Skip to content

Conversation

@owenowenisme
Copy link
Member

@owenowenisme owenowenisme commented Jan 7, 2026

Description

Make the Union operator not blocking when preserve_order is enabled if _add_input_inner is called with the input in the front.

Signed-off-by: You-Cheng Lin <[email protected]>
@owenowenisme owenowenisme added alpha Alpha release features data Ray Data-related issues go add ONLY when ready to merge, run all tests and removed alpha Alpha release features labels Jan 7, 2026
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request fixes a blocking issue in the Union operator when preserve_order is enabled. The change introduces a streaming approach, flushing data from input operators as they complete, which is a good improvement. My review identifies a critical potential for deadlock because the flushing logic isn't triggered when an input stream finishes without sending a final block. I've also included a couple of suggestions to improve code readability and maintainability by reducing duplication and simplifying expressions. Addressing the critical issue is necessary to prevent hangs in the data processing pipeline.

Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
@owenowenisme owenowenisme marked this pull request as ready for review January 7, 2026 16:05
@owenowenisme owenowenisme requested a review from a team as a code owner January 7, 2026 16:05
Copy link
Contributor

@alexeykudinkin alexeykudinkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@owenowenisme the problem is that we still we're exhausting 1 input before we move on to the next one, which means we're gonna be accumulating remaining outputs before producing.

Instead, please take a look how we achieve determinism in make_async_gen and apply the same technique here:

  • Iterate over inputs in the same order
  • Always deque 1 block and never skip an input!
  • Once op completes you can start skipping it

@owenowenisme
Copy link
Member Author

@alexeykudinkin
Wait, are you saying that we round-robin the input ops of Union? I think Union should not behave like make_async_gen, because Union's documented behavior is concatenation (ds1.union(ds2) → all of ds1, then all of ds2), while round-robin would produce interleaved output. These are different orderings.

Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
@owenowenisme owenowenisme force-pushed the data/make_union_not_block_when_preserve_order_true branch from e4e4007 to da8d1b5 Compare January 21, 2026 02:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data Ray Data-related issues go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants